
Enable "kick the tires" support for Nvidia GPUs in COS #45136

Merged 3 commits into kubernetes:master on May 23, 2017

Conversation

@vishh (Contributor) commented Apr 29, 2017

This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS).
User-space libraries and debug utilities from the Nvidia driver installation are made available in a special directory on the host:

  • /home/kubernetes/bin/nvidia/lib for libraries
  • /home/kubernetes/bin/nvidia/bin for debug utilities

Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from the host directories using HostPath volumes.
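As a rough illustration, a CUDA pod on COS could consume those host directories as in the hedged sketch below. The image name, the alpha GPU resource name, and the in-container mount paths are illustrative assumptions rather than anything defined by this PR:

kubectl create -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cuda-example
spec:
  containers:
  - name: cuda-app
    image: gcr.io/my-project/cuda-vector-add:v0.1   # hypothetical image
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1           # alpha GPU resource name (assumption)
    env:
    - name: LD_LIBRARY_PATH
      value: /usr/local/nvidia/lib                  # matches the mountPath chosen below
    volumeMounts:
    - name: nvidia-libraries
      mountPath: /usr/local/nvidia/lib
    - name: nvidia-debug-tools
      mountPath: /usr/local/nvidia/bin
  volumes:
  - name: nvidia-libraries
    hostPath:
      path: /home/kubernetes/bin/nvidia/lib         # libraries published by the installer
  - name: nvidia-debug-tools
    hostPath:
      path: /home/kubernetes/bin/nvidia/bin         # debug utilities published by the installer
EOF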

Note: This solution requires updating Pod Spec across distros. This is a known issue and will be addressed in the future. Until then CUDA workloads will not be portable.

This PR also updates the COS base image version to m59. The image update is coupled with the GPU changes for the following reasons:

  1. Driver installation requires disabling a kernel feature in COS.
  2. The kernel API for disabling that feature changed across COS versions.
  3. If the COS image update were not handled in this PR, a subsequent COS image update would break GPU integration and would require another update to the installation scripts added here.
  4. Instead of posting three PRs (one to add the basic installer, one to update COS to m59, and one to update the installer again), this PR combines the changes to reduce review overhead and latency, as well as the noise that would be created when GPU tests break.

Try out this PR

  1. Get quota for GPUs in any region.
  2. export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci
  3. NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh
  4. kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml
  5. Run your CUDA app in a pod (a consolidated sketch of these steps follows below).
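The same steps, collected into one hedged shell sketch (fill in a zone where you have GPU quota):

# Hedged sketch of the try-out steps above.
export KUBE_GCE_ZONE=<zone-with-gpus>
export KUBE_NODE_OS_DISTRIBUTION=gci

# Bring up a cluster with one K80 attached to each node.
NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh

# Install the Nvidia drivers on the COS nodes with the daemonset from this PR.
kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml

# Once the installer pods complete, run your CUDA app in a pod
# (see the HostPath example in the description above).
kubectl get pods -o wide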

Another option is to run an e2e manually to try out this PR

  1. Get quota for GPUs in any region.
  2. export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci
  3. export NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"
  4. go run hack/e2e.go -- --up
  5. hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"
    The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration.
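To spot-check a node by hand after the installer daemonset has run (outside of the e2e), something along these lines should work; treating nvidia-smi as one of the shipped debug utilities is an assumption:

# Hedged sketch: verify the installer's output on a GPU node.
NODE=<name-of-a-gpu-node>    # e.g. pick one from `kubectl get nodes`
gcloud compute ssh "${NODE}" --zone "${KUBE_GCE_ZONE}" --command '
  ls /home/kubernetes/bin/nvidia/lib /home/kubernetes/bin/nvidia/bin &&
  sudo LD_LIBRARY_PATH=/home/kubernetes/bin/nvidia/lib \
      /home/kubernetes/bin/nvidia/bin/nvidia-smi'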

TODO:

  • Update COS image version to m59 release.
  • Remove sleep from the install script and add it to the daemonset
  • Add an e2e that will run the daemonset and run a sample CUDA app on COS clusters.
  • Set up a test project with the necessary quota to run GPU tests against HEAD, to start with; see "Adding CI e2e for testing GPU support in Kubernetes" (test-infra#2759)
  • Update node e2e serial configs to install nvidia drivers on COS by default

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 29, 2017
@k8s-reviewable: This change is Reviewable

@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Apr 29, 2017
@vishh vishh force-pushed the cos-nvidia-driver-install branch 4 times, most recently from b880ab3 to bfa122d Compare May 5, 2017 23:58
@k8s-github-robot k8s-github-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 8, 2017
@vishh vishh force-pushed the cos-nvidia-driver-install branch from 5eddeb4 to 2bc0629 Compare May 8, 2017 03:17
@vishh vishh changed the title WIP: Automated install of nvidia drivers in COS. Automated install of nvidia drivers in COS. May 10, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 10, 2017
@vishh vishh force-pushed the cos-nvidia-driver-install branch from bc47260 to 86c5ae4 Compare May 10, 2017 22:47
@k8s-github-robot k8s-github-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 11, 2017
@vishh vishh force-pushed the cos-nvidia-driver-install branch from 795840d to 35a6810 Compare May 13, 2017 23:29
@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 13, 2017
@vishh (Contributor Author) commented May 13, 2017

@Amey-D This PR works now. I see the e2e gets stuck at times on a new cluster while trying to figure out whether nodes have GPUs in their capacity. I will try to get to the bottom of it soon.
Otherwise, this PR can be a good starting point. PTAL.

@mtaufen will you be able to review this patch?

@vishh vishh added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels May 13, 2017
@vishh vishh added this to the v1.7 milestone May 13, 2017
@vishh vishh changed the title Automated install of nvidia drivers in COS. Enable "kick the tires" support for Nvidia GPUs in COS May 15, 2017
@vishh vishh force-pushed the cos-nvidia-driver-install branch from 4daf007 to 6c4372d Compare May 15, 2017 20:02
@@ -1593,4 +1593,5 @@ else
fi
reset-motd
prepare-mounter-rootfs
modprobe configs
Contributor

For IKCONFIG? Note it is not listed in the GKE node image spec https://docs.google.com/document/d/1qmiJOuLYqjJZF-PTfn-xvTbLvzTgFw8gMcWavX7qiQ0/edit

So we may have to double check our images to ensure they are built with this module.

Contributor Author

Good point. How are folks expected to discover the doc you posted? Can you add an entry for /proc/config.gz?

Contributor Author

Oh, Nvidia driver installation may not require configs on all distros. I'm not sure about that.

Contributor

@dchen1107 is the person to talk to about changing the node spec
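As background for this thread: modprobe configs loads the module that exposes the running kernel's configuration at /proc/config.gz (CONFIG_IKCONFIG_PROC), which the installer presumably needs in order to build the driver against the running kernel. A hedged way to check a node image for it:

# Hedged sketch: confirm the node image exposes its kernel configuration.
modprobe configs || echo "configs module not available in this image"
if [[ -r /proc/config.gz ]]; then
  zcat /proc/config.gz | grep -E 'CONFIG_IKCONFIG|CONFIG_MODULES'
else
  echo "/proc/config.gz missing; kernel may lack CONFIG_IKCONFIG_PROC"
fi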

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get -qq update
RUN apt-get install -qq -y pciutils gcc g++ git make dpkg-dev bc module-init-tools curl
Contributor

I believe -qq implies -y

Contributor Author

Oh sweet.

NVIDIA_DRIVER_PKG_NAME="NVIDIA-Linux-x86_64-375.26.run"

check_nvidia_device() {
lspci
Contributor

Maybe worth grepping for NVIDIA devices here too, for less output?

Contributor Author

I can optimize it by wrapping around a command, but I actually like verbose output.

Contributor

fair enough
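For reference, the reviewer's suggestion amounts to something like:

# Hedged sketch: limit lspci output to Nvidia devices.
lspci | grep -i nvidia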


echo "Running the Nvidia driver installer ..."
if ! sh "${NVIDIA_DRIVER_PKG_NAME}" --kernel-source-path="${KERNEL_SRC_DIR}" --silent --accept-license --keep --log-file-name="${log_file_name}"; then
echo "Nvidia installer failed, log below:"
Contributor

Maybe also print where people can find the full log file

Contributor Author

All the commands run by this script will be printed (set -x), so the tail command below will display the log file name.

Contributor

ok
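For context, the failure branch presumably continues roughly as below; the exact tail invocation is not shown in the excerpt and is an assumption:

# Hedged sketch of the failure path; the tail call is assumed.
if ! sh "${NVIDIA_DRIVER_PKG_NAME}" --kernel-source-path="${KERNEL_SRC_DIR}" \
    --silent --accept-license --keep --log-file-name="${log_file_name}"; then
  echo "Nvidia installer failed, log below:"
  tail -n 50 "${log_file_name}"   # with set -x, this line also echoes the log file path
  exit 1
fi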

@@ -23,7 +23,7 @@ genrule(
name = "bindata",
srcs = [
"//examples:sources",
"//test/images:sources",
"//cluster/gce/gci/nvidia-gpus:sources//test/images:sources",
Contributor

looks like you might have accidentally ended up with two srcs in the same string here

Contributor Author

Yeah. Fixing it. Good catch.

)

func makeCudaAdditionTestPod() *v1.Pod {
podName := testPodNamePrefix + string(uuid.NewUUID())
Contributor

reason to prefer this over GenerateName?

Contributor Author

debugging....

Contributor

fair point

return testPod
}

func isClusterRunningCOS(f *framework.Framework) bool {
Contributor

Might want to skip the master here, since theoretically you could have a non-cos master with cos nodes, and the nodes are where you care about GPU (though I don't think this os image skew happens in practice today). One way to do this would be to skip unschedulable nodes.

Contributor Author

Why would unschedulable nodes have GPUs on them? Seems like an expensive unschedulable node.

Contributor

Masters are marked unschedulable, so skipping unschedulables will skip masters. All I'm saying is that this test only cares that COS is on the nodes, and doesn't have to fail if the master is non-COS.
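The same "only schedulable nodes need GPUs" check can be eyeballed from the command line; jq is required, and the alpha GPU resource name below is an assumption about this release:

# Hedged sketch: list schedulable nodes with their advertised GPU capacity.
kubectl get nodes -o json | jq -r '
  .items[]
  | select(.spec.unschedulable != true)   # skips the master and cordoned nodes
  | "\(.metadata.name)\t\(.status.capacity["alpha.kubernetes.io/nvidia-gpu"] // "0")"'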


func areGPUsAvailableOnAllSchedulableNodes(f *framework.Framework) bool {
framework.Logf("Getting list of Nodes from API server")
nodeList, err := f.ClientSet.Core().Nodes().List(metav1.ListOptions{})
Contributor

you could pass nodeList into these helper functions, instead of re-listing on every call

Contributor Author

I actually want to re-list.

Contributor

ah, ok

IMAGE = $(REGISTRY)/cuda-vector-add

build:
docker build --pull -t $(IMAGE):$(TAG) .
Contributor

consider specifying arch as well

Contributor Author

Good point. I'd like to do it later since I don't know the multi-arch story for CUDA yet.

Contributor

ok
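One hedged way to make the architecture explicit, following the arch-suffixed image naming common in this repo at the time rather than anything in this PR, would be:

# Hedged sketch: arch-suffixed image name (REGISTRY/TAG mirror the Makefile above).
REGISTRY=gcr.io/google_containers
TAG=v0.1
ARCH=amd64
docker build --pull -t "${REGISTRY}/cuda-vector-add-${ARCH}:${TAG}" .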

# limitations under the License.

TAG=v0.1
REGISTRY=gcr.io/google_containers
Contributor

consider ?= for TAG and REGISTRY

Contributor Author

done. good idea.
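With ?= in place, an exported environment value (or a make command-line override) takes precedence over the defaults, e.g.:

# Hedged usage sketch: ?= only assigns when the variable is not already set.
REGISTRY=gcr.io/my-test-project TAG=v0.2 make   # default target; actual target name may differ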

@vishh vishh force-pushed the cos-nvidia-driver-install branch from 6c4372d to 1b022fb Compare May 16, 2017 03:39
@vishh vishh (Contributor Author) left a comment

Addressed comments. PTAL.

all: container

container:
docker build --pull -t ${REGISTRY}/${IMAGE}:${TAG} .
Contributor Author

This is currently only meant for COS, which is restricted to amd64 on GCP. Do you think more code is useful?

@vishh vishh force-pushed the cos-nvidia-driver-install branch from 76c18a9 to 1968f78 Compare May 20, 2017 13:39
@vishh (Contributor Author) commented May 20, 2017

Changes to hack/* are minimal. Adding an approval label.

@vishh vishh added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2017
@vishh (Contributor Author) commented May 21, 2017

@k8s-bot unit test this

@vishh (Contributor Author) commented May 21, 2017

@madhusudancs @nikhiljindal can either of you help me understand why the federation e2es are failing for this PR?
I'm updating the master and node image project dynamically as part of cluster setup. I could not figure out whether federation has independent logic for bringing up clusters. Kubemark had an independent setup script and I fixed that.

@vishh (Contributor Author) commented May 21, 2017

@k8s-bot kops aws e2e test this

@vishh (Contributor Author) commented May 21, 2017

@k8s-bot pull-kubernetes-federation-e2e-gce test this

vishh added 2 commits May 20, 2017 21:17
…Optimized OS

Packaged the script as a docker container stored in gcr.io/google-containers
A daemonset deployment is included to make it easy to consume the installer
A cluster e2e has been added to test the installation daemonset along with verifying installation
by using a sample CUDA application.
Node e2e for GPUs updated to avoid running on nodes without GPU devices.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
Signed-off-by: Vishnu kannan <vishnuk@google.com>
@vishh vishh force-pushed the cos-nvidia-driver-install branch from e106bd6 to 86b5edb Compare May 21, 2017 04:17
Signed-off-by: Vishnu kannan <vishnuk@google.com>
@madhusudancs (Contributor)

@vishh please ignore federation tests for now. We are actively debugging the problem. It is not "required" right now, so PRs can merge without it passing.

@vishh (Contributor Author) commented May 21, 2017

@madhusudancs Thanks for the tip.
@mtaufen Let's get this merged as soon as you get a chance to review.

# Otherwise, we respect whatever is set by the user.
MASTER_IMAGE=${KUBE_GCE_MASTER_IMAGE:-${GCI_VERSION}}
MASTER_IMAGE_PROJECT=${KUBE_GCE_MASTER_PROJECT:-google-containers}
DEFAULT_GCI_PROJECT=google-containers
Contributor

since you are changing the default version, why refer to google-containers project in context of gci here and elsewhere? why not just switch the projects to cos-cloud?

Contributor Author

I can change the defaults here too. Honestly I don't think it matters. All these hacks would hopefully disappear soon.
At this point, I want this PR to go in and then I can clean things up.
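For readers following along, the default-project handling under discussion presumably ends up along these lines (a hedged reconstruction built from the variables visible in the excerpt; the cos-cloud switch is exactly the part being debated):

# Hedged reconstruction of the default-project selection.
DEFAULT_GCI_PROJECT=google-containers
if [[ "${GCI_VERSION}" == "cos"* ]]; then
  DEFAULT_GCI_PROJECT=cos-cloud
fi
MASTER_IMAGE_PROJECT=${KUBE_GCE_MASTER_PROJECT:-${DEFAULT_GCI_PROJECT}}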

@mtaufen mtaufen (Contributor) left a comment

LGTM (one typo in a comment, but that's it)

# git checkout "tags/v${kernel_version_stripped}"
git checkout ${LAKITU_KERNEL_SHA1}

# Prepare kernel configu and source for modules.
Contributor

s/configu/config

MASTER_IMAGE=${KUBE_GCE_MASTER_IMAGE:-${GCI_VERSION}}
MASTER_IMAGE_PROJECT=${KUBE_GCE_MASTER_PROJECT:-google-containers}
DEFAULT_GCI_PROJECT=google-containers
if [[ "${GCI_VERSION}" == "cos"* ]]; then
Contributor

"cos"* is a neat trick, will have to add that to my toolbox.

@mtaufen (Contributor) commented May 23, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 23, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mtaufen, vishh
We suggest the following additional approvers: gmarek, madhusudancs

Assign the PR to them by writing /assign @gmarek @madhusudancs in a comment when ready.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-ci-robot (Contributor) commented May 23, 2017

@vishh: The following test(s) failed:

Test name Commit Details Rerun command
Jenkins unit/integration 1968f78c425e54889e3ac502dc154f623bf04ade link @k8s-bot unit test this
Jenkins kops AWS e2e 1968f78c425e54889e3ac502dc154f623bf04ade link @k8s-bot kops aws e2e test this
pull-kubernetes-federation-e2e-gce 333e571 link @k8s-bot pull-kubernetes-federation-e2e-gce test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 1e21058 into kubernetes:master May 23, 2017
@@ -49,21 +49,21 @@ images:
tests:
- 'resource tracking for 105 pods per node \[Benchmark\]'
gci-resource1:
image: gci-stable-56-9000-84-2
image: cos-beta-59-9460-20-0
project: google-containers
Contributor

Projects need to be updated to cos-cloud in this config.

Contributor Author

Oops. I thought I updated it everywhere. Apologies.

}

restart_kubelet() {
echo "Sending SIGTERM to kubelet"
Member

Why do we need to restart the kubelet during the installation? This kind of dependency could be error-prone: deploying and managing this nvidia-gpus daemonset depends on the kubelet, but in the middle of running the daemonset (installer.sh), the kubelet is restarted.

Contributor Author

This is necessary for the kubelet to pick up the GPUs. Kubelet cannot support hotplugging GPUs for various reasons. We tried that with just PCI based data and it's proving to be hard.
Ideally, we need to reboot the entire node, but we haven't gotten there yet.
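For context, the restart boils down to something like the sketch below; relying on the node's init system or kubelet babysitter to bring the kubelet back up is an assumption about the environment, not something shown in the excerpt:

# Hedged sketch: stop the running kubelet and let its supervisor restart it,
# so that it re-detects the freshly installed GPU devices on startup.
restart_kubelet() {
  echo "Sending SIGTERM to kubelet"
  pkill -TERM kubelet || echo "kubelet process not found"
  # The init system / kubelet babysitter on the node is assumed to restart it.
}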

k8s-github-robot pushed a commit that referenced this pull request May 25, 2017
Automatic merge from submit-queue (batch tested with PRs 46299, 46309, 46311, 46303, 46150)

Fix cos image project to cos-cloud.

Addressed #45136 (comment).

@vishh @yujuhong @dchen1107
k8s-github-robot pushed a commit that referenced this pull request Jun 3, 2017
Automatic merge from submit-queue (batch tested with PRs 41563, 45251, 46265, 46462, 46721)

change kubemark image project to match new cos image project

The old project is not available anymore.

#45136
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.